Improving Robustness of Foundation Models in Domain Adaptation with Soup-Adapters

Roschkowski, Marco

arXiv.org Artificial Intelligence

Computer vision has seen tremendous progress due to the emergence of deep learning technologies. Large supervised benchmark datasets such as ImageNet (Deng et al. 2009) have enabled several methodological breakthroughs: surpassing traditional computer vision methods (Krizhevsky et al. 2012), the introduction of skip connections (He et al. 2016), advanced architectures such as inverted bottlenecks (Sandler et al. 2018), and improved scaling techniques (Koonce and Koonce 2021). A long-standing limitation has been the dependence on such large curated datasets, which are expensive to obtain. Recently, the paradigm of foundation models has become an attractive alternative, in which a single model is trained on a corpus of data large enough to generalize well across several distinct downstream tasks. One notable vision foundation model is CLIP (Radford et al. 2021), which learns a joint embedding space of images and their corresponding captions; this architecture can naturally perform zero-shot classification by describing visual categories via text prompts. Another popular foundation model is DINOv2 (Oquab et al. 2023), which was trained on a large curated corpus of images to produce robust features. These models can easily be adapted for few-shot learning using KNN evaluation or prototypical learning (Snell et al. 2017).
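The prototypical-learning adaptation mentioned above is simple enough to sketch: given frozen foundation-model features, each class prototype is the mean of its support embeddings, and queries are assigned to the nearest prototype by cosine similarity. The sketch below (function names are illustrative, not from the paper) assumes features are already extracted as NumPy arrays:

```python
import numpy as np

def prototypical_classify(support_feats, support_labels, query_feats):
    """Assign each query to the class whose prototype (the mean of that
    class's support embeddings) is most similar in cosine distance."""
    def normalize(x):
        # L2-normalize rows so dot products act as cosine similarity.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    support = normalize(np.asarray(support_feats, dtype=float))
    query = normalize(np.asarray(query_feats, dtype=float))
    classes = np.unique(support_labels)
    # One prototype per class: the (re-normalized) mean support embedding.
    protos = normalize(np.stack(
        [support[support_labels == c].mean(axis=0) for c in classes]))
    sims = query @ protos.T  # cosine similarity of each query to each prototype
    return classes[np.argmax(sims, axis=1)]
```

With a CLIP or DINOv2 backbone, `support_feats` and `query_feats` would come from the frozen image encoder; no gradient updates are needed, which is what makes this an attractive few-shot baseline.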


Radial Networks: Dynamic Layer Routing for High-Performance Large Language Models

Dotzel, Jordan, Akhauri, Yash, AbouElhamayed, Ahmed S., Jiang, Carly, Abdelfattah, Mohamed, Zhang, Zhiru

arXiv.org Artificial Intelligence

Large language models (LLMs) often struggle with strict memory, latency, and power demands. To meet these demands, various forms of dynamic sparsity have been proposed that reduce compute on an input-by-input basis. These approaches improve over static methods by exploiting the variance across individual inputs, which has steadily grown with the exponential increase in training data. Meanwhile, the increasing depth of modern models, currently hundreds of layers, has opened opportunities for dynamic layer sparsity, which skips the computation of entire layers. In this work, we explore the practicality of layer sparsity by profiling residual connections and establish the relationship between model depth and layer sparsity. For example, the residual blocks in the OPT-66B model have a median contribution of 5% to its output. We then take advantage of this dynamic sparsity and propose Radial Networks, which perform token-level routing between layers guided by a trained router module. These networks can be obtained by post-training distillation from sequential networks or trained from scratch to co-learn the router and layer weights. They enable scaling to larger model sizes by decoupling the number of layers from the dynamic depth of the network, and their design allows for layer reuse. By varying the compute token by token, they reduce the overall resources needed for generating entire sequences. Overall, this leads to larger-capacity networks with significantly lower compute and serving costs for large language models.
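The two ideas in the abstract, profiling how much a residual block contributes to the hidden state and routing tokens past low-contribution layers, can be sketched in a few lines. This is a toy illustration under my own assumptions (the function names and the threshold-based router are hypothetical; the paper's actual router is a trained module):

```python
import numpy as np

def residual_contribution(x, block_out):
    """Contribution of a residual block f to the updated hidden state,
    measured as ||f(x)|| / ||x + f(x)||. A small ratio (e.g. the 5%
    median reported for OPT-66B) suggests the block can often be skipped."""
    return np.linalg.norm(block_out) / np.linalg.norm(x + block_out)

def forward_with_layer_skipping(x, blocks, router, threshold=0.5):
    """Toy dynamic-depth forward pass: a router scores each residual
    block for the current hidden state, and blocks scoring below the
    threshold are skipped entirely (the state passes through unchanged)."""
    for i, block in enumerate(blocks):
        if router(x, i) >= threshold:
            x = x + block(x)  # execute the residual block
        # else: skip; x flows through the residual connection untouched
    return x
```

A production version would route per token inside a batched transformer and would learn the router jointly with (or distill it from) the layer weights; the point here is only that skipping a layer is a no-op on the residual stream, which is what makes layer sparsity cheap to exploit.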